Overview

Fetches bulk and block deal data for the last 30 days from the Dhan Static ScanX API. Due to API limitations (max 10-day date range per request), the script fetches data in three 10-day chunks and deduplicates results. Source: fetch_bulk_block_deals.py
Phase: Phase 2 (Enrichment)
Output: bulk_block_deals.json

API Endpoint

POST https://ow-static-scanx.dhan.co/staticscanx/deal

API Limitations

The API enforces a maximum date range of 240 hours (10 days) between start and end dates. Requests exceeding this will fail.

Request Payload

data.startdate
string
required
Start date in DD-MM-YYYY format
data.enddate
string
required
End date in DD-MM-YYYY format (max 10 days from startdate)
data.defaultpage
string
default:"N"
Whether to use default pagination
data.pageno
int
required
Page number (starts at 1)
data.pagecount
int
default:"50"
Number of results per page

Example Request

{
  "data": {
    "startdate": "01-01-2024",
    "enddate": "10-01-2024",
    "defaultpage": "N",
    "pageno": 1,
    "pagecount": 50
  }
}
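The payload above can be assembled with a small helper. This is a sketch: build_payload is a hypothetical name for illustration, not a function in the script itself.

```python
def build_payload(start_str: str, end_str: str, page_no: int, page_count: int = 50) -> dict:
    """Build the request body for the /staticscanx/deal endpoint.

    Dates must be in DD-MM-YYYY format and span at most 10 days.
    """
    return {
        "data": {
            "startdate": start_str,
            "enddate": end_str,
            "defaultpage": "N",
            "pageno": page_no,
            "pagecount": page_count,
        }
    }

payload = build_payload("01-01-2024", "10-01-2024", page_no=1)
```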

Function Signature

def fetch_bulk_block_deals():
    """
    Fetches bulk/block deals for the last 30 days in 3 chunks of 10 days each.
    Auto-paginates through all pages for each chunk.
    Deduplicates results and saves to bulk_block_deals.json.
    """

Pagination Logic

The script auto-paginates based on the totalcount field in the API response:
page_no = 1
max_pages = 1  # Will be updated from first response

while page_no <= max_pages:
    # ... make request ...
    
    total_count = data.get('totalcount', 0)
    if total_count > 0:
        max_pages = math.ceil(total_count / 50)
    
    page_no += 1
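For instance, with the default page size of 50, a totalcount of 247 (an illustrative value, not a real response) yields five pages:

```python
import math

page_size = 50
total_count = 247  # example value that might come back in the API response
max_pages = math.ceil(total_count / page_size)
print(max_pages)  # 5
```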

Date Chunking Strategy

Fetches 30 days in three 10-day chunks:
for i in range(3):
    days_offset_end = i * 10 
    days_offset_start = days_offset_end + 9
    
    chunk_end_date = end_date_ref - timedelta(days=days_offset_end)
    chunk_start_date = end_date_ref - timedelta(days=days_offset_start)
    
    start_str = chunk_start_date.strftime("%d-%m-%Y")
    end_str = chunk_end_date.strftime("%d-%m-%Y")
Example:
  • Chunk 1: Days 0-9 ago (most recent)
  • Chunk 2: Days 10-19 ago
  • Chunk 3: Days 20-29 ago
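Pinning the reference date makes the chunk boundaries concrete. The date 5 March 2024 below is arbitrary, chosen only so the output is reproducible:

```python
from datetime import datetime, timedelta

end_date_ref = datetime(2024, 3, 5)  # fixed reference date for illustration
chunks = []
for i in range(3):
    days_offset_end = i * 10
    days_offset_start = days_offset_end + 9
    chunk_end = end_date_ref - timedelta(days=days_offset_end)
    chunk_start = end_date_ref - timedelta(days=days_offset_start)
    chunks.append((chunk_start.strftime("%d-%m-%Y"), chunk_end.strftime("%d-%m-%Y")))

# chunks[0] == ("25-02-2024", "05-03-2024")  — days 0-9 ago
# chunks[1] == ("15-02-2024", "24-02-2024")  — days 10-19 ago
# chunks[2] == ("05-02-2024", "14-02-2024")  — days 20-29 ago
```

Note each chunk spans exactly 10 calendar days inclusive, staying within the API's 240-hour limit.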

Deduplication

Results are deduplicated using a composite key to handle overlapping chunks:
unique_deals_map = {}
for d in all_raw_deals:
    key = f"{d.get('sym')}_{d.get('date')}_{d.get('qty')}_{d.get('avgprice')}_{d.get('bs')}_{d.get('cname')}"
    unique_deals_map[key] = d

sorted_deals = sorted(list(unique_deals_map.values()), key=lambda x: x.get('date', ''), reverse=True)
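A toy run shows how a record fetched twice from overlapping chunks collapses to a single entry. The sample deals are illustrative, not real data:

```python
all_raw_deals = [
    {"sym": "RELIANCE", "date": "2024-01-15", "qty": 5000000,
     "avgprice": 2450.50, "bs": "B", "cname": "ABC INVESTMENTS LTD"},
    # same deal fetched again from an overlapping chunk
    {"sym": "RELIANCE", "date": "2024-01-15", "qty": 5000000,
     "avgprice": 2450.50, "bs": "B", "cname": "ABC INVESTMENTS LTD"},
    {"sym": "INFY", "date": "2024-01-14", "qty": 100000,
     "avgprice": 1500.00, "bs": "S", "cname": "XYZ FUND"},
]

unique_deals_map = {}
for d in all_raw_deals:
    key = f"{d.get('sym')}_{d.get('date')}_{d.get('qty')}_{d.get('avgprice')}_{d.get('bs')}_{d.get('cname')}"
    unique_deals_map[key] = d  # later occurrences overwrite earlier ones

sorted_deals = sorted(unique_deals_map.values(),
                      key=lambda x: x.get('date', ''), reverse=True)
print(len(sorted_deals))  # 2 — duplicate collapsed, newest deal first
```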

Output Structure

sym
string
Stock trading symbol
date
string
Deal date
qty
number
Quantity traded
avgprice
number
Average deal price
bs
string
Buy (B) or Sell (S) indicator
cname
string
Client/counterparty name
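For downstream consumers, the core fields can be captured in a small dataclass. This is a sketch for type clarity only; the script itself works with plain dicts, and the Deal class is hypothetical:

```python
from dataclasses import dataclass

@dataclass
class Deal:
    sym: str         # stock trading symbol
    date: str        # deal date as reported by the API
    qty: float       # quantity traded
    avgprice: float  # average deal price
    bs: str          # "B" (buy) or "S" (sell)
    cname: str       # client/counterparty name

    @classmethod
    def from_api(cls, d: dict) -> "Deal":
        return cls(d["sym"], d["date"], d["qty"],
                   d["avgprice"], d["bs"], d["cname"])

deal = Deal.from_api({"sym": "RELIANCE", "date": "2024-01-15", "qty": 5000000,
                      "avgprice": 2450.50, "bs": "B", "cname": "ABC INVESTMENTS LTD"})
```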

Additional Fields

The API response may include additional fields like:
  • Deal type (Bulk/Block)
  • Exchange
  • Remarks

Example Output

[
  {
    "sym": "RELIANCE",
    "date": "2024-01-15",
    "qty": 5000000,
    "avgprice": 2450.50,
    "bs": "B",
    "cname": "ABC INVESTMENTS LTD"
  }
]

Dependencies

  • requests — HTTP client
  • json — JSON parsing
  • datetime — Date range calculations
  • math — Pagination math (ceil)
  • pipeline_utils.py — Provides get_headers() function

Configuration

Hardcoded configuration:
url = "https://ow-static-scanx.dhan.co/staticscanx/deal"
headers = get_headers()
end_date_ref = datetime.now()  # Today
page_size = 50
chunks = 3  # 3 x 10-day chunks = 30 days

Error Handling

  • 10-second timeout per request
  • Breaks pagination loop on HTTP errors or empty responses
  • Continues to next chunk if one chunk fails
  • Prints detailed progress and error messages
try:
    response = requests.post(url, json=payload, headers=headers, timeout=10)
    if response.status_code == 200:
        data = response.json()
        # ... process deals and update max_pages ...
    else:
        print(f"  Error fetching page {page_no}: Status {response.status_code}")
        break
except Exception as e:
    print(f"  Exception fetching page {page_no}: {e}")
    break

Progress Tracking

print(f"Fetching deals for chunk {i+1}/3: {start_str} to {end_str}...")
print(f"  Fetched page {page_no}/{max_pages} ({len(deals)} items)")

Usage Example

python3 fetch_bulk_block_deals.py
Expected Output:
Fetching deals for chunk 1/3: 24-02-2024 to 03-03-2024...
  Fetched page 1/3 (50 items)
  Fetched page 2/3 (50 items)
  Fetched page 3/3 (25 items)
Fetching deals for chunk 2/3: 14-02-2024 to 23-02-2024...
  Fetched page 1/2 (50 items)
  Fetched page 2/2 (30 items)
Fetching deals for chunk 3/3: 04-02-2024 to 13-02-2024...
  Fetched page 1/1 (42 items)
Successfully saved 247 unique bulk/block deals to bulk_block_deals.json

Integration

This script is part of Phase 2 (Enrichment) in the EDL Pipeline. The output file is consumed by:
  • add_corporate_events.py — Adds "📦: Block Deal" event markers for deals in last 7 days
Run via master pipeline:
python3 run_full_pipeline.py

Performance Notes

  • Total API calls: ~3-10 requests (depending on deal volume)
  • Typical runtime: 5-10 seconds
  • No threading (sequential pagination required)
  • Data volume: Typically 200-500 deals per month